
Data-Prep-Kit: getting your data ready for LLM application development

Wood, David, Lublinsky, Boris, Roytman, Alexy, Singh, Shivdeep, Adam, Constantin, Adebayo, Abdulhamid, An, Sungeun, Chang, Yuan Chi, Dang, Xuan-Hong, Desai, Nirmit, Dolfi, Michele, Emami-Gohari, Hajar, Eres, Revital, Goto, Takuya, Joshi, Dhiraj, Koyfman, Yan, Nassar, Mohammad, Patel, Hima, Selvam, Paramesvaran, Shah, Yousaf, Surendran, Saptha, Tsuzuku, Daiki, Zerfos, Petros, Daijavad, Shahrokh

arXiv.org Artificial Intelligence

Data preparation is the first and one of the most important steps in any Large Language Model (LLM) development effort. This paper introduces an easy-to-use, extensible, and scale-flexible open-source data preparation toolkit called Data Prep Kit (DPK). DPK is architected and designed to enable users to scale their data preparation to their needs. With DPK, users can prepare data on a local machine or effortlessly scale to run on a cluster with thousands of CPU cores. DPK comes with a highly scalable, yet extensible set of modules that transform natural language and code data. If users need additional transforms, they can easily develop them using DPK's extensive support for transform creation. These modules can be used independently or pipelined to perform a series of operations. In this paper, we describe the DPK architecture and show its performance from a small scale to a very large number of CPUs. The modules from DPK have been used for the preparation of Granite Models [1] [2]. We believe DPK is a valuable contribution to the AI community, making it easy to prepare data that enhances the performance of LLM models or supports Retrieval-Augmented Generation (RAG) applications.
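The abstract's core idea, independently pipelined data transforms, can be illustrated with a minimal sketch. Note that the class and method names below (`Transform`, `apply`, `run_pipeline`) are hypothetical stand-ins chosen for illustration; they are not the actual DPK API.

```python
# Hypothetical minimal transform interface illustrating the pipelining idea.
# These names are illustrative only and do NOT reflect the real DPK API.
class Transform:
    def apply(self, records):
        raise NotImplementedError


class Deduplicate(Transform):
    """Drop records whose text has been seen before."""
    def apply(self, records):
        seen, out = set(), []
        for r in records:
            if r["text"] not in seen:
                seen.add(r["text"])
                out.append(r)
        return out


class FilterShort(Transform):
    """Drop records with fewer than min_words words."""
    def __init__(self, min_words=3):
        self.min_words = min_words

    def apply(self, records):
        return [r for r in records if len(r["text"].split()) >= self.min_words]


def run_pipeline(records, transforms):
    # Each transform runs independently; chaining them forms a pipeline.
    for t in transforms:
        records = t.apply(records)
    return records


docs = [{"text": "hello world again"}, {"text": "hello world again"}, {"text": "hi"}]
clean = run_pipeline(docs, [Deduplicate(), FilterShort(min_words=2)])
print(clean)  # [{'text': 'hello world again'}]
```

Each transform is self-contained, which is what lets a framework run the same logic on a laptop or fan it out across a cluster.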


What's New in PyTorch 2.0? torch.compile - PyImageSearch

#artificialintelligence

Over the last few years, PyTorch has evolved into a popular and widely used framework for training deep neural networks (DNNs). The success of PyTorch is attributed to its simplicity, first-class Python integration, and imperative style of programming. Since its launch in 2017, PyTorch has strived for high performance alongside eager execution. It has provided some of the best abstractions for distributed training, data loading, and automatic differentiation. With continuous innovation from the PyTorch team, PyTorch has moved from version 1.0 to the most recent version, 1.13. However, over all these years, hardware accelerators like GPUs have become 15x and 2x faster in compute and memory access, respectively. Thus, to leverage these resources and deliver high-performance eager execution, the team moved substantial parts of PyTorch internals to C++.


Command line arguments for your Python script

#artificialintelligence

Working on a machine learning project means we need to experiment. Having a way to configure your script easily will help you move faster. In Python, we have a way to adapt the code from the command line. In this tutorial, we are going to see how we can pass command line arguments to a Python script to help you work better in your machine learning project. There are many ways to run a Python script.
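The standard-library route for this is `argparse`. A minimal sketch of configuring an ML experiment from the shell (the flag names here are illustrative):

```python
import argparse

# Build a parser exposing the hyperparameters we want to tweak between runs.
def build_parser():
    parser = argparse.ArgumentParser(description="Train a model")
    parser.add_argument("--lr", type=float, default=0.001, help="learning rate")
    parser.add_argument("--epochs", type=int, default=10, help="training epochs")
    parser.add_argument("--verbose", action="store_true", help="print progress")
    return parser

# parse_args() normally reads sys.argv; an explicit list works for testing.
args = build_parser().parse_args(["--lr", "0.01", "--epochs", "5"])
print(args.lr, args.epochs, args.verbose)  # 0.01 5 False
```

Because `argparse` handles type conversion, defaults, and `--help` generation, each experiment becomes a one-line shell invocation instead of an edit to the source.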


Training GANs in Julia's Flux

#artificialintelligence

In order to run machine learning experiments effectively, we need a fast turnaround time for model training. So simply implementing the model is not the only thing we need to worry about. We also want to be able to change the hyperparameters in a convenient way, either through a configuration file or through command line arguments. This post demonstrates how I train a vanilla GAN on the MNIST dataset. It is not about GAN theory; for that, the original paper by Goodfellow et al. [1] is a good starting point. Instead, I focus on how to structure the code and the subtle implementation issues I came across when writing it. You can find the current version of the code on GitHub.


OpenCV Face detection with Haar cascades - PyImageSearch

#artificialintelligence

In this tutorial, you will learn how to perform face detection with OpenCV and Haar cascades. I've been an avid reader of PyImageSearch for the last three years; thanks for all the blog posts! My company does a lot of face application work, including face detection, recognition, etc. We just started a new project using embedded hardware. I don't have the luxury of using OpenCV's deep learning face detector, which you covered before; it's just too slow on my devices.


conjugateprior - Content Analysis in Python

#artificialintelligence

This page is currently not much more than an extended advertisement for doing content analysis in Python. In time it might expand into a full tutorial, should anyone express interest in reading one. The scripts presented here are not intended to teach programming; I assume you have at least a vague idea about that already. Nor are they intended to exemplify fine coding style. The point is to show how easy things can be, if you pick the right tools. The concept of identity seems to be all the rage now in the social sciences.
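As a taste of how little code basic content analysis takes, here is a word-frequency sketch using only the standard library (the sample text and stopword list are, of course, illustrative):

```python
import re
from collections import Counter

# A tiny sample text on the theme mentioned above.
text = """Identity is contested. The politics of identity shapes how
identity claims are made and how such claims are received."""

# Tokenize to lowercase words, then drop common function words.
words = re.findall(r"[a-z']+", text.lower())
stopwords = {"the", "of", "is", "how", "and", "are", "such"}
counts = Counter(w for w in words if w not in stopwords)

print(counts.most_common(2))  # [('identity', 3), ('claims', 2)]
```

With a real stopword list and a real corpus, the same dozen lines already yield a usable term-frequency profile.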



Autoencoders for Content-based Image Retrieval with Keras and TensorFlow - PyImageSearch

#artificialintelligence

In this tutorial, you will learn how to use convolutional autoencoders to create a Content-based Image Retrieval system (i.e., image search engine) using Keras and TensorFlow. The tutorials were a big hit; however, one topic I did not touch on was Content-based Image Retrieval (CBIR), which is really just a fancy academic word for image search engines. Image search engines are similar to text search engines, only instead of presenting the search engine with a text query, you instead provide an image query -- the image search engine then finds all visually similar/relevant images in its database and returns them to you (just as a text search engine would return links to articles, blog posts, etc.). I'll show you how to implement each of these phases in this tutorial, leaving you with a fully functioning autoencoder and image retrieval system. To learn how to use autoencoders for image retrieval with Keras and TensorFlow, just keep reading!
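The retrieval phase described above reduces to nearest-neighbor search once the autoencoder's encoder has mapped every database image to a latent vector. A sketch of that step, with random vectors standing in for real encoder outputs:

```python
import numpy as np

# Random stand-ins for latent codes an encoder would produce for a database
# of 100 images, each compressed to a 16-dimensional vector.
rng = np.random.default_rng(0)
index_vectors = rng.normal(size=(100, 16))

# A query whose latent code lies very close to database item 42.
query = index_vectors[42] + 0.01 * rng.normal(size=16)

# Rank database items by Euclidean distance in latent space.
distances = np.linalg.norm(index_vectors - query, axis=1)
top5 = np.argsort(distances)[:5]
print(top5[0])  # 42 -- the closest database item
```

In a real CBIR system the only change is that `index_vectors` and `query` come from the trained encoder; the search logic itself stays this simple (or is swapped for an approximate nearest-neighbor index at scale).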


Raspberry Pi and Movidius NCS Face Recognition - PyImageSearch

#artificialintelligence

The first two are pre-trained deep learning models, meaning that they are provided to you as-is by OpenCV. The Movidius NCS will perform inference using each of these models. The third recognizer model is not a form of deep learning. Rather, it is our SVM machine learning face recognition model. The RPi CPU will have to handle making face recognition predictions using it. We also load our label encoder which holds the names of the people our model can recognize (Line 42). Let's initialize our video stream: Line 47 initializes and starts our VideoStream object. We wait for the camera sensor to warm up on Line 48. Line 51 initializes our FPS counter for benchmarking purposes.
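The SVM-plus-label-encoder pattern described above can be sketched in a few lines of scikit-learn. The 8-dimensional "embeddings" and the names below are synthetic stand-ins for real face-embedding vectors:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import SVC

# Synthetic face embeddings: two well-separated clusters, one per person.
rng = np.random.default_rng(1)
names = ["alice"] * 20 + ["bob"] * 20
X = np.vstack([rng.normal(0.0, 0.3, (20, 8)),    # cluster for "alice"
               rng.normal(2.0, 0.3, (20, 8))])   # cluster for "bob"

# The label encoder maps names to integers and back again.
le = LabelEncoder()
y = le.fit_transform(names)

# The SVM recognizer classifies embeddings on the CPU.
recognizer = SVC(kernel="linear", probability=True)
recognizer.fit(X, y)

# A probe embedding near bob's cluster, as a new camera frame would yield.
probe = rng.normal(2.0, 0.3, (1, 8))
pred = recognizer.predict(probe)[0]
print(le.inverse_transform([pred])[0])  # bob
```

This mirrors the division of labor in the post: the heavy embedding computation is offloaded (there, to the Movidius NCS), while the lightweight SVM prediction and name lookup run comfortably on the Pi's CPU.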


Detecting Natural Disasters with Keras and Deep Learning - PyImageSearch

#artificialintelligence

In this tutorial, you will learn how to automatically detect natural disasters (earthquakes, floods, wildfires, cyclones/hurricanes) with up to 95% accuracy using Keras, Computer Vision, and Deep Learning. I remember the first time I ever experienced a natural disaster -- I was just a kid in kindergarten, no more than 6-7 years old. We were outside for recess, playing on the jungle gym, running around like the wild animals that young children are. Rain was in the forecast. My mother had given me a coat to wear outside, but I was hot and uncomfortable -- the humidity made the cotton/polyester blend stick to my skin.